Comprehensive analysis revealed the immunoinflammatory targets of rheumatoid arthritis based on intestinal flora, miRNA, transcription factors, and RNA-binding proteins databases, GSEA and GSVA pathway observations, and immunoinfiltration typing

Objective Rheumatoid arthritis (RA) is a chronic inflammatory arthritis. This study aimed to identify potential biomarkers and possible pathogenesis of RA using various bioinformatics analysis tools. Methods The GMrepo database provided a visual representation of the analysis of intestinal flora. We selected the GSE55235 and GSE55457 datasets from the Gene Expression Omnibus database to identify differentially expressed genes (DEGs) separately. With the intersection of these DEGs with the target genes associated with RA found in the GeneCards database, we obtained the DEGs targeted by RA (DERATGs). Subsequently, Disease Ontology, Gene Ontology, and the Kyoto Encyclopedia of Genes and Genomes were used to analyze DERATGs functionally. Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) were performed on the data from the gene expression matrix. Additionally, the protein-protein interaction network, transcription factor (TF)-targets, target-drug, microRNA (miRNA)-mRNA networks, and RNA-binding proteins (RBPs)-DERATGs correlation analyses were built. The CIBERSORT was used to evaluate the inflammatory immune state. The single-sample GSEA (ssGSEA) algorithm and differential analysis of DERATGs were used among the infiltration degree subtypes. Results There were some correlations between the abundance of gut flora and the prevalence of RA. A total of 54 DERATGs were identified, mainly related to immune and inflammatory responses and immunodeficiency diseases. Through GSEA and GSVA analysis, we found pathway alterations related to metabolic regulations, autoimmune diseases, and immunodeficiency-related disorders. We obtained 20 hub genes and 2 subnetworks. Additionally, we found that 39 TFs, 174 drugs, 2310 miRNAs, and several RBPs were related to DERATGs. Mast, plasma, and naive B cells differed during immune infiltration. We discovered DERATGs’ differences among subtypes using the ssGSEA algorithm and subtype grouping. 
Conclusions The findings of this study could help with RA diagnosis, prognosis, and targeted molecular treatment. Supplementary Information The online version contains supplementary material available at 10.1186/s41065-024-00310-6.


Introduction
Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized by chronic inflammation that can damage joints and extra-articular organs [1]. It deteriorates intermittently, and without proper therapy, the symptoms worsen over time until the joints are irreparably damaged, leading to additional physical and psychological issues [2]. Therefore, managing and preventing RA requires early identification, diagnosis, and treatment. Some studies have shown that successful early intervention can significantly lower the financial burden of RA [3,4]. However, early-stage RA is typically misleading in its presentation and challenging to identify [5]. Rheumatoid factors, anti-citrullinated protein antibodies (ACPAs), erythrocyte sedimentation rate, and C-reactive protein are the only four biomarkers currently used to identify RA, and each has some limitations [6]. Conventional, biological, and novel abiotic disease-modifying antirheumatic drugs are recognized treatment options, and a composite score is used to quantify disease activity. While most patients respond to the available treatments and experience remission, many do not respond or are resistant [7]. Therefore, it is crucial to thoroughly understand the evolving mechanism of RA, search for novel markers that might be used for clinical diagnosis or identification of RA conditions, and identify more effective drug treatment targets. Based on the issues mentioned above and their significance, we propose the following scientific hypotheses: several mechanisms of RA can be identified and used for diagnosis, and some genetic characteristics could serve as new targets for clinical treatment with current drugs.
We used various bioinformatics analytical methods to examine biomarkers and the inflammatory status of RA, including R packages from Bioconductor; the databases Gene Expression Omnibus (GEO), GMrepo, GeneCards, STRING, PharmGKB, DrugCentral, TargetScan, and RNA-Binding Protein DataBase (RBPDB); the Cytoscape software; and the CIBERSORT website. This study provides a comprehensive reference for the current RA treatment conundrum by examining the pathological molecular mechanism of RA and analyzing the druggable targets that can be used for clinical diagnosis and treatment based on the mined key gene targets.

The intestinal flora analysis
The GMrepo database (https://gmrepo.humangut.info/home) was used to retrieve relevant intestinal microbiotas of RA [8]. A correlation map was constructed between the relative abundance of gut microbiota and the prevalence of RA. A species co-occurrence network map was constructed by analyzing the relationships between species or genera of gut microbiota in patients with RA.

Data download and preprocessing
The microarray datasets GSE55235 [9] and GSE55457 [9] were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) using the R package GEOquery [10]. All dataset samples were generated from Homo sapiens using the GPL96 [HG-U133A] Affymetrix Human Genome U133A Array platform. The GSE55235 dataset contained 10 samples from patients with RA and 10 samples from healthy volunteers, and the GSE55457 dataset contained 13 samples from patients with RA and 10 samples from healthy volunteers; these samples were used in this study. The RMA algorithm from the Affy package in R was used to normalize the data [11]. Statistical power analysis of the data was performed with the RNASeqSampleSize package [12].
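RMA combines background correction, quantile normalization, and probe-set summarization. As a rough illustration of the quantile normalization step only, the following Python sketch forces every sample to share the same empirical distribution. It is not the Affy implementation: the function name and the toy matrix are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Illustrative sketch only: ties are broken by row order, not averaged.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # Extract each sample (column) and record which gene holds each rank.
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    order = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean across samples at each rank.
    rank_means = [
        sum(columns[j][order[j][r]] for j in range(n_samples)) / n_samples
        for r in range(n_genes)
    ]
    # Replace every value with the reference value of its within-sample rank.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for r, i in enumerate(order[j]):
            result[i][j] = rank_means[r]
    return result


# Hypothetical toy expression matrix: 4 genes (rows) x 3 samples (columns).
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 5.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After normalization, every column of `normalized` contains the same set of values, which is what makes samples from different arrays comparable before differential analysis.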

Differentially Expressed Genes (DEGs) screening and functional analysis
The DEGs for the two datasets, GSE55235 and GSE55457, were identified using limma [13]. The DEGs were then displayed in R as volcano plots and heat maps using the ggplot2 and pheatmap packages, respectively. DEGs were selected using |log2 fold change (log2FC)| > 1 and adjusted P-value < 0.05. The GeneCards database (http://www.genecards.org/) was used to find the RA target genes (RATGs) [14] by searching the keyword "rheumatoid arthritis." Next, the DEGs targeted by RA (DERATGs) were obtained by overlapping the DEGs and RATGs using a Venn diagram. Subsequently, Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DERATGs were performed using the clusterProfiler package [15], and Disease Ontology (DO) enrichment analysis of the DERATGs was performed using the DOSE package [16]. Adjusted P-value < 0.05 was considered statistically significant. Meanwhile, Gene Set Enrichment Analysis (GSEA) was performed on all RA genes (previously ranked based on their log2FC between analyzed groups) using the clusterProfiler package. Enrichment was considered significant if the false discovery rate (FDR) was < 0.25 and the P-value was < 0.05, with "c2.cp.kegg.v7.5.1.symbols.gmt" as the reference gene set.
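The screening logic described above can be sketched in Python as follows. This is a minimal illustration, not the limma workflow itself: the function name, gene symbols, and values are hypothetical, and it assumes that DERATGs are the genes shared by the DEGs of both datasets and the GeneCards targets.

```python
def screen_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split a differential-expression table into up/downregulated gene sets.

    `results` maps gene symbol -> (log2FC, adjusted P-value).
    """
    up, down = set(), set()
    for gene, (log2fc, padj) in results.items():
        if padj < padj_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).add(gene)
    return up, down


# Hypothetical toy tables standing in for the two datasets.
dataset_a = {
    "GENE1": (2.3, 0.001),   # passes both thresholds, upregulated
    "GENE2": (-1.6, 0.02),   # passes both thresholds, downregulated
    "GENE3": (0.4, 0.001),   # fold change too small
    "GENE4": (1.8, 0.20),    # adjusted P-value too large
}
dataset_b = {
    "GENE1": (1.9, 0.004),
    "GENE2": (-0.7, 0.03),
    "GENE5": (1.2, 0.01),
}
disease_targets = {"GENE1", "GENE2", "GENE5"}  # stand-in for the RATG list

up_a, down_a = screen_degs(dataset_a)
up_b, down_b = screen_degs(dataset_b)
# Assumed three-way overlap, as in a Venn diagram of the three gene lists.
deratgs = (up_a | down_a) & (up_b | down_b) & disease_targets
```

In this toy case only `GENE1` survives, because it is the only gene that is a DEG in both tables and also appears in the target list.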
Using the gene set variation analysis (GSVA) package [17], the RA gene expression matrix data were subjected to GSVA. The differential pathways were filtered according to adjusted P-value < 0.05 and |log2FC| > 0.263.

Construction of the Protein-Protein Interaction (PPI) network
The PPI network of the DERATGs was analyzed using the interaction relations in the STRING database (https://string-db.org/) [18]. Network node attributes were calculated using NetworkAnalyzer in Cytoscape [19]. Cytoscape's cytoHubba plugin [20] predicted important nodes (or hub proteins). The subnetworks were extracted from the whole PPI network using MCODE [20].

Analysis of immune cell infiltration
CIBERSORT deconvolves the transcriptome expression matrix using linear support vector regression to estimate the composition and abundance of immune cells within a mixed cell population [26]. We entered the gene expression matrix data into CIBERSORT, retained samples with a P-value < 0.05, and created the immune cell infiltration matrix. The ggheatmap package generated heat maps depicting the 22 immune cell types in each sample. Boxplots were created using the ggplot2 and ggpubr packages to investigate differences in immune infiltrating cells between groups, with P < 0.05 as the screening standard.
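The step from regression output to an infiltration matrix can be sketched as follows. This Python example does not perform the support vector regression; it only illustrates the post-processing commonly described for CIBERSORT (negative coefficients set to zero, the remainder rescaled to sum to one) together with the P < 0.05 sample filter. All names and numbers are hypothetical.

```python
def to_fractions(coefficients):
    """Convert raw regression coefficients into cell-type fractions."""
    # Negative coefficients are treated as absence of that cell type.
    clipped = {cell: max(0.0, w) for cell, w in coefficients.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("no positive coefficients to normalize")
    return {cell: w / total for cell, w in clipped.items()}


def build_infiltration_matrix(samples, p_cutoff=0.05):
    """Keep samples whose deconvolution P-value passes the cutoff.

    `samples` maps sample name -> (coefficients dict, deconvolution P-value).
    """
    return {
        name: to_fractions(coefs)
        for name, (coefs, p_value) in samples.items()
        if p_value < p_cutoff
    }


# Hypothetical toy input for two samples.
samples = {
    "S1": ({"Plasma cells": 0.6, "B cells naive": 0.3,
            "Eosinophils": -0.1, "Mast cells activated": 0.1}, 0.01),
    "S2": ({"Plasma cells": 0.2, "B cells naive": 0.2,
            "Eosinophils": 0.0, "Mast cells activated": 0.6}, 0.30),
}
infiltration = build_infiltration_matrix(samples)
```

Here `S2` is dropped because its deconvolution P-value exceeds the cutoff, and the fractions of `S1` sum to one with the negative eosinophil coefficient set to zero.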

Subgroup evaluation
Twenty-eight RA samples were given enrichment scores using single-sample GSEA (ssGSEA) [27]. Following that, depending on immune infiltrating cell expression, we categorized RA into two subtypes (Cluster1 and Cluster2).
We investigated the DEGs of DERATGs between these subtypes in the GSE55235 and the GSE55457 datasets.

Statistical analyses of the intestinal microbiota
A correlation map between the abundance of gut microbiota and the prevalence of RA was created using the GMrepo database (Fig. 3A). The percentage of samples in which a species/genus exceeded the 0.01% abundance threshold was counted, and the mean/median relative abundance of species in all RA samples was also summarized. The species co-occurrence network diagram (Fig. 3B) was constructed, with nodes representing species or genera co-occurring with other species or genera in the RA samples. Node size reflects the number of connected network nodes, and line width (Pearson's correlation) represents the co-occurrence relationships between species or genera.
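The prevalence and abundance summary described above can be illustrated with a short Python sketch. It is not GMrepo code: the function name and the abundance values are hypothetical, and abundances are assumed to be expressed in percent.

```python
from statistics import mean, median


def summarize_taxon(abundances, threshold=0.01):
    """Summarize one species/genus across samples.

    `abundances` holds relative abundances in percent, one value per sample.
    Prevalence is the share of samples above the abundance threshold.
    """
    present = sum(1 for a in abundances if a > threshold)
    return {
        "prevalence_pct": 100.0 * present / len(abundances),
        "mean_abundance": mean(abundances),
        "median_abundance": median(abundances),
    }


# Hypothetical relative abundances (%) of one taxon in four RA samples.
summary = summarize_taxon([0.0, 0.005, 0.02, 1.5])
```

In this toy case two of the four samples exceed the 0.01% threshold, so the prevalence is 50%, while the mean and median are computed over all four samples.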

Data selection and DERATGs screening
The analysis data from the GEO platform were summarized and sorted (Table 1). Data comparison showed that the sample sizes for the RA and normal groups were roughly balanced, and the statistical power of the sample sizes of the two data sets was examined (Table 1), laying the groundwork for further analysis. The gene expression matrix of the GSE55235 dataset was first standardized and processed. As shown in the volcano plot (Fig. 4A) created using R software after data preprocessing, the gene expression matrix yielded 296 upregulated and 248 downregulated genes. The top 10 genes with the most significant differences were identified. We also showed the heat map of DEGs for the GSE55235 dataset (Fig. 4B). Following standardization, the RA samples of the GSE55457 dataset were compared to the normal samples. Differential analysis yielded 114 upregulated and 51 downregulated genes, shown in the volcano plot (Fig. 4C). The heat map was also used to show the expression between samples (Fig. 4D). The disease targets of RA were retrieved from GeneCards (Supplement Spreadsheet S1). The differential genes of the two datasets were intersected with the target genes from the GeneCards database, and 54 DERATGs were finally screened out (Fig. 4E).

GO, KEGG, and DO enrichment analysis
Functional enrichment analyses for GO (Table 2), KEGG (Table 3), and DO (Table 4) were then carried out on the DERATGs. The GO results showed that DERATGs were primarily linked to the cytokine-mediated signaling pathway, clathrin-coated endocytic vesicle membrane, G protein-coupled receptor binding, and other biological terms (Fig. 5A-C). The KEGG results showed that DERATGs were most enriched in the tumor necrosis factor signaling pathway, chemokine signaling pathway, osteoclast differentiation, and other pathways (Fig. 5D-F). According to the DO findings, DERATGs were particularly enriched in myeloma, bone marrow cancer, multiple myeloma, and other diseases (Fig. 5G-I).
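Over-representation analyses of this kind are commonly based on the hypergeometric test. The Python sketch below illustrates that calculation for a single term. It is an assumption that this matches the exact procedure of the packages used here; the function name and the example counts are hypothetical, and multiple-testing adjustment is not shown.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_query, n_overlap):
    """Upper-tail hypergeometric P-value for one annotation term.

    n_background: annotated genes in the background
    n_term:       background genes annotated to the term
    n_query:      genes in the query list (e.g. the DERATGs)
    n_overlap:    query genes annotated to the term
    Returns P(X >= n_overlap) under random sampling without replacement.
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 54 query genes, 8 of which fall in a 200-gene term
# drawn from a background of 20000 annotated genes.
p_example = enrichment_p_value(20000, 200, 54, 8)
```

A small P-value indicates that the overlap is larger than expected by chance; in practice the P-values of all tested terms are then adjusted before applying the 0.05 cutoff.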

GSEA and GSVA analysis
Our reference gene set was "c2.cp.kegg.v7.5.1.symbols.gmt." The two datasets were subjected to a GSEA enrichment analysis to identify significant enrichment according to the criteria FDR < 0.25 and P < 0.05 (Table 5). The GSEA enrichment analysis revealed that the DERATGs in the GSE55235 dataset were significantly enriched in upregulated pathways, such as the intestinal immune network for immunoglobulin (Ig)A production, allograft rejection, and autoimmune thyroid disease (Fig. 6A), and in downregulated pathways, such as ribosome biogenesis in eukaryotes, basal cell carcinoma, and mitophagy-animal (Fig. 6B). Similarly, the GSEA enrichment analysis of the GSE55457 dataset (Fig. 6D, E) revealed very high similarity with the GSE55235 dataset, supporting the validity of the DERATGs and enabling further analysis. GSVA enrichment analysis was performed on the GSE55235 and GSE55457 datasets, and distinct pathways were displayed (Table 6). The differential pathways in the GSE55235 dataset included autoimmune thyroid disease, the intestinal immune network for IgA production, viral myocarditis, and so on (Fig. 6C). These outcomes matched the GSVA differential pathways of the GSE55457 dataset (Fig. 6F).
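For readers unfamiliar with how GSEA decides whether a pathway is up- or downregulated, the following Python sketch computes the classic weighted running-sum enrichment score for one gene set. It is a simplified illustration, not the clusterProfiler implementation: permutation-based P-values and FDR estimation are omitted, and the gene names and scores are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    `ranked` is a list of (gene, score) pairs sorted by score, descending
    (e.g. genes ranked by log2FC). Returns the running-sum value with the
    largest absolute deviation from zero.
    """
    hit_weights = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    n_hits = sum(1 for g, _ in ranked if g in gene_set)
    n_miss = len(ranked) - n_hits
    total_hit_weight = sum(hit_weights)
    if n_hits == 0 or n_miss == 0 or total_hit_weight == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, extreme = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit_weight   # step up at a set member
        else:
            running -= 1.0 / n_miss           # step down otherwise
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranked list of four genes.
ranked = [("G1", 2.0), ("G2", 1.0), ("G3", -1.0), ("G4", -2.0)]
es_up = enrichment_score(ranked, {"G1", "G2"})
es_down = enrichment_score(ranked, {"G3", "G4"})
```

A gene set concentrated at the top of the ranking yields a positive score (an upregulated pathway), and one concentrated at the bottom yields a negative score (a downregulated pathway).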

Analysis of PPI using DERATGs
PPI of 54 DERATGs was assessed using the database STRING, and 47 DERATGs were discovered to have a PPI link, which was as follows: C7, ERAP2, LAP3, RASGRP1, SEMA4D, SFRP1, TYMS, LAMP3, NCF1, AIM2, … the network (Fig. 7B). Larger DERATG nodes indicate a higher node degree, and thicker lines indicate more edge interactions.
The cytoHubba tool was used to search for hub nodes in the network, and the MCC method was used to determine the top 20 genes as key gene nodes. Darker node colors indicate higher scores (Fig. 7C). The subnetworks were built using MCODE, which clusters the network into functional modules (Fig. 7D, E), and the resulting subnetworks reveal the dense areas of potential biological functions.
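The MCC ranking can be illustrated with a small Python sketch. It assumes the definition usually given for cytoHubba's maximal clique centrality, in which a node's score is the sum of (|C| - 1)! over all maximal cliques C that contain it; this is an assumption rather than a statement about the plugin's internals, and the toy network is hypothetical.

```python
from math import factorial


def maximal_cliques(adj):
    """Yield the maximal cliques of an undirected graph (Bron-Kerbosch)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def mcc_scores(edges):
    """Score each node as the sum of (|C| - 1)! over its maximal cliques."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = dict.fromkeys(adj, 0)
    for clique in maximal_cliques(adj):
        for node in clique:
            scores[node] += factorial(len(clique) - 1)
    return scores


# Hypothetical toy network: a triangle A-B-C plus a pendant edge C-D.
scores = mcc_scores([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
top_nodes = sorted(scores, key=scores.get, reverse=True)
```

In the toy network, node `C` ranks first because it belongs to both the triangle and the pendant edge; selecting hub genes corresponds to taking the highest-scoring entries of such a ranking.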

Construction of TF-targets, miRNA-mRNA network, and RBP-DERATGs correlation analysis
The TRRUST database predicted the TFs of the 54 DERATGs. Thirty-nine TFs were obtained, corresponding to 25 DERATGs. The regulatory relationships were visualized as a network (Fig. 8A). The TargetScan database also predicted the miRNAs of the DERATGs, and 2310 miRNAs were finally predicted to have regulatory relationships with 51 DERATGs. The regulatory relationships were analyzed by network visualization (Fig. 8B). Finally, RBP genes were extracted from the RBPDB database. Correlation analysis was conducted to observe the correlation between RBP genes and the 54 DERATGs in the two datasets (GSE55235 and GSE55457) separately, and the results were displayed as heat maps (Fig. 8C, D).
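The correlation analysis behind such heat maps can be sketched in Python as follows. The text does not state which correlation coefficient was used, so Pearson's correlation is assumed here for illustration; the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant vector
    return cov / (sd_x * sd_y)


def correlation_matrix(rbp_expr, target_expr):
    """Correlate every RBP gene with every target gene across samples."""
    return {
        rbp: {target: pearson(rbp_values, target_values)
              for target, target_values in target_expr.items()}
        for rbp, rbp_values in rbp_expr.items()
    }


# Hypothetical expression values across four samples.
rbp_expr = {"RBP_A": [1.0, 2.0, 3.0, 4.0]}
target_expr = {
    "TARGET_1": [2.0, 4.0, 6.0, 8.0],   # rises with RBP_A
    "TARGET_2": [8.0, 6.0, 4.0, 2.0],   # falls as RBP_A rises
}
corr = correlation_matrix(rbp_expr, target_expr)
```

Each cell of the resulting matrix corresponds to one cell of the heat map, with positive values for co-varying gene pairs and negative values for inversely varying pairs.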

Immune infiltration analysis
The immune cell infiltration of the RA and normal group samples in the GSE55235 and GSE55457 datasets was analyzed based on the CIBERSORT algorithm. The immune infiltration of GSE55235 was analyzed, and a heat map was drawn (Fig. 10A). The unexpressed eosinophils were eliminated, and only the cells that were expressed in the samples, such as monocytes, follicular helper T cells, and neutrophils, were retained in the heat map. The image shows that plasma cells, naive B cells, etc., had a high infiltration level in the RA group, while resting dendritic cells, activated mast cells, etc., had a low infiltration level. Immune cells from different groups were compared (Fig. 10B), and cells with significant differences (P < 0.05) are displayed in the figure. Consistent with the heat map, plasma cells, resting dendritic cells, gamma delta T cells, activated mast cells, and naive B cells differed significantly. For the GSE55457 dataset, the immune infiltration heat map was also drawn (Fig. 10C).
The unexpressed eosinophils were eliminated, and only the cells expressed in the samples, such as activated dendritic cells, monocytes, and neutrophils, were retained in the heat map. As shown in the figure, the RA group had high levels of infiltration of plasma cells, naive B cells, and others, whereas there was little infiltration of activated mast cells, dendritic cells, and so forth. The differential comparison between groups of the GSE55457 dataset (Fig. 10D) revealed significant differences in M1 macrophages, plasma cells, naive B cells, follicular helper T cells, activated mast cells, and activated dendritic cells, which was consistent with the results of the GSE55235 dataset.

Subtype construction based on immune infiltration analysis
The immune cell infiltration of the RA and normal group samples in the GSE55235 and GSE55457 datasets was analyzed using the ssGSEA algorithm. The infiltration profiles of immune cells were obtained using marker genes specific to 28 types of immune cells. The RA samples were divided into two subtypes, Cluster1 and Cluster2, by high and low expression clustering (Fig. 11A, B). MDSCs, eosinophils, activated CD4+ T cells, and other immune cells were highly expressed in Cluster1 but lowly expressed in Cluster2.

Differential analysis of DERATGs between subtypes
The expression of the 54 DERATGs in the subtypes was analyzed following the RA subtyping in the GSE55235 and GSE55457 datasets (Fig. 12A, B). The figure shows the DERATGs that differ significantly (P < 0.05).
The C7 gene was highly expressed in Cluster2 of the GSE55235 dataset, while the other significantly different DERATGs were highly expressed in Cluster1. The ERAP2 gene was highly expressed in Cluster2 of the GSE55457 dataset, whereas the other significantly different DERATGs were highly expressed in Cluster1.

Discussion
Our goal was to identify universal markers of RA. However, the data sets utilized in this study did not precisely specify factors related to drugs, vaccinations, age, environment, psychology, region, genetics, or epidemiology. This does not imply that these factors did not affect the patients in these two data sets. To prevent bias from an increased proportion of particular subjects, we excluded a few data sets with particular descriptions. A total of 54 DERATGs were found by comparing the genes expressed in samples from patients with RA and normal groups. These DERATGs were strongly associated with inflammation and immune response. Although the three well-known RA star molecules TNF, IL6, and JAK were absent from the DERATGs examined in this study, the KEGG enrichment data showed that DERATGs were enriched in the TNF signaling pathway. In studies of biologically targeted drug therapy, these three molecules are often examined together with biological processes or signaling pathways rather than individually [28][29][30]. In other words, investigating what seems to be a single molecule amounts to investigating the entire signaling pathway, although these star molecules play an undeniably crucial role in the pathway. Moreover, we discovered that several key molecules did not exhibit significant differences in expression in actual analyses (such as the GSEA and GSVA pathway analyses in this study). Therefore, the stringent gene screening threshold may explain why the star molecules were absent from this study.
We performed GSVA and GSEA analyses on all gene expression data to further evaluate RA's complex signature of immune/inflammatory responses. Interestingly, the "Intestinal immune network for IgA production" pathway was upregulated in our study, likely supporting our findings from the GMrepo online database. It has been noted that patients with RA (both new-onset and chronic) showed IgA-like antibody responses to either Prevotella copri (P. copri) or its 27-kDa protein, which are associated with the production of TH17 cell cytokines and the presence of ACPAs [31]. Additionally, intestinal tissue samples from patients with RA contain higher levels of IgA antibodies that recognize dietary antigens [32]. Presently, most RA microbiome studies focus on associations, aiming to link changes in the bacterial composition of the gastrointestinal tract with the condition. Although these findings suggest practical applicability, the mechanism by which gut flora influences the development of RA is still not fully understood [31]. Hence, our study may provide more details and references for future research, diagnostics, and therapeutic approaches.
The inflammatory process in RA depends on chemokines. Multivariate discriminant analysis revealed that the chemokines CXCL10 and CXCL13 were significantly more abundant in the blood plasma of patients with RA than in healthy volunteers [33]. According to an in vitro study, the most likely target molecule of abatacept (ABT) in inflamed rheumatoid joints is CXCL10, and serum CXCL10 levels may be a feasible predictor of the therapeutic response to ABT treatment [34]. Previous studies have shown that CCR5 DNA variation affects the severity of RA [35] and that CCR5 increases the chemotactic response in the synovial fluid of patients with RA [36]. A recent literature review reported that CCR5 had undoubtedly gained its place in RA pathogenesis as an important genetic risk factor [37]. PTPRC, also known as CD45, performs several crucial regulatory functions in cell growth, differentiation, mitosis, and malignant transformation [38]. It has been demonstrated to regulate T- and B-cell antigen receptor signaling [39]. In our study, the PTPRC gene plays a vital role in this signal transduction network. The roles of the PTPRC gene in the pathogenesis of RA are currently poorly understood. However, four anti-TNF treatments (TNF-α inhibitors, adalimumab, infliximab, and etanercept) were linked to PTPRC in the drug-gene interaction network, suggesting that PTPRC is a druggable gene that can be targeted by these agents. Additionally, PTPRC has been demonstrated to be the most frequently replicated genetic biomarker of TNFi response and to be useful for targeted therapy in patients with RA [40]. Unfortunately, neither IL-6 nor JAK inhibitors showed any evidence of a genetic relationship in our study. This supports the idea that biological processes, rather than specific molecules, are currently the focus of drug research. However, the druggable targets selected for this study offer a broad reference point for future research into new targets for old medicines.
In diseases like cancer, autoimmune disease, diabetes, and cardiovascular disease, TFs play a crucial biological role [41]. Although most TFs have traditionally been regarded as "undruggable" targets [41], current research has revealed that the tumor therapy drug Binimetinib may have a potential targeted binding effect on NFKB1 [42]. Additionally, it has been reported that small molecule inhibitors can specifically target AR, making it the primary treatment target for advanced cancer [43]. Research has also found that combining ketoprofen and indolamide inhibits Gli1-mediated transcription in the Hedgehog pathway [44]. It has been demonstrated that the novel oral active molecular glue WBC100 selectively degrades the protein c-Myc over other proteins and effectively kills cancer cells that overexpress c-Myc [45]. Human triple-negative breast and gastric cancer xenografts have been demonstrated to regress in response to WZ-2-033, a new STAT3 inhibitor [46]. According to a study, a bromodomain and extra-terminal domain inhibitor can induce tumor cell apoptosis by disrupting the specific transcription network that the TCF4 TF regulates [47]. However, other TFs in this study have not consistently been reported to be pharmacologically actionable. In conclusion, although the majority of present work on druggable TFs focuses on cancer drug development, it also offers suggestions for work on RA-related druggable TFs. We believe that NFKB1, AR, GLI1, Myc, STAT3, and TCF4 are now the most potentially druggable TFs based on the regulatory relationship between TFs and DERATGs in this study.
Fig. 11 Rheumatoid arthritis sample subtype analysis. A Heat map of the 28 distinct types of immune cell infiltration in the GSE55235 dataset. Red and green symbolize the groups in Clusters 1 and 2, respectively. B Heat map of immune cell infiltration of 28 different kinds in the GSE55457 dataset. Red and green symbolize the groups in Clusters 1 and 2, respectively
Currently, cell-type deconvolution analysis is frequently applied in RA research. FAS, MAPK8, and TNFSF10 may be associated with alterations in the immune microenvironment in patients with RA, according to a study that used CIBERSORT analysis [48]. Immune cell infiltration analysis showed that SLC2A3 is positively associated with activated mast cells in RA synovial tissue [49]. A CIBERSORT study found that the RA key genes CXCL8, CXCL2, and FADD were associated with mast cells, monocytes, activated natural killer cells, CD8 T cells, resting dendritic cells, and plasma cells [50]. In this study, we performed a thorough analysis of the immune infiltration landscape using the ssGSEA and CIBERSORT algorithms to quantify the profile of immune infiltration in RA. Studies have shown that 20% of the antibodies that mature naive B cells produce when they reach the periphery are still autoreactive, and this percentage is significantly higher in patients with RA [51,52]. Rituximab, a therapeutic antibody that targets CD20, has been successfully used as a B cell therapy to treat RA. Over the last decade, additional RA studies have suggested that (autoreactive) B cells may contribute to the progression of the disease [53]. There has been no research on RA therapy targeting plasma cells, and the function of plasma cells in RA is still unknown [54].
Regarding the involvement of mast cells and dendritic cells (DCs) in the pathogenesis of RA, conflicting results have been found. Although most research has focused on the role of mast cells in the pathogenesis of RA [55][56][57][58][59][60] and has shown that immature and activated DC populations are present in the synovium of the inflamed joint [61], we found that patients with RA had decreased mast cell and DC infiltration in their tissues. However, other studies have suggested that steroid use may be related to decreased mast cells and DCs in patients with RA [62][63][64]. The specific cause of the decline in mast cells and DCs in RA must be further investigated because it is unknown whether the patients in this study were using corticosteroids or other drugs.
Cluster1 and Cluster2, subtypes of expression profiles, were identified based on immune cell expression. The purpose of this analysis was to better understand the function and regulatory mechanisms of immune cells by characterizing the gene expression patterns of each subtype. Determining which characteristics the subtypes share may help determine the most effective course of action. Additionally, subtype expression profiles can aid in the discovery of novel therapeutic targets: analyzing the genes expressed in specific subtypes might identify potential genes or molecules to be used as therapeutic targets. However, further experimental verification is required. The findings indicate that the PTPRC gene was highly expressed in Cluster1 in the GSE55235 and GSE55457 datasets. PTPRC may be the characteristic gene of the Cluster1 subtype in these two datasets or may play a significant biological role directly associated with the function of this subtype. We also noticed discrepancies in the analysis results between the two data sets, which we believe may be due to a batch effect caused by differences in sample collection location, collection time, and sequencing time. This batch effect is another major limitation of this study.
This study has some other drawbacks. First, the study lacked clinically important details about the condition, such as disease activity and treatment usage. Additionally, no multi-omics analyses were conducted, and the study's sole focus was on the gene transcriptome. Finally, data analysis was limited to bioinformatics approaches; preclinical and clinical validation is required.
In conclusion, the scientific community needs to investigate and comprehend how gut microbiota, genetics, and immune inflammation are related to the etiology of RA. The findings of this study might be used as a reference for clinical diagnosis, prognosis, and targeted molecular treatment for RA.

Appendix Table 7 Gene symbol
Figure 1 depicts the workflow of the current study. Figure 2 summarizes the main findings of this study.

Fig. 3 Data on intestinal microbiota in rheumatoid arthritis. A Prevalence abundance map analysis. B Species co-occurrence network diagram. Green indicates positive, and red indicates negative correlations

Fig. 4 Differentially expressed genes targeted by RA (DERATGs) screening. A Volcano plot of GSE55235. Red signifies upregulated DEGs, green signifies downregulated DEGs, and blue signifies no DEGs. B Heat map of GSE55235. Blue represents the normal group, and red represents the rheumatoid arthritis (RA) group. C Volcano plot of GSE55457. Red indicates upregulated DEGs, green indicates downregulated DEGs, and blue indicates no DEGs. D Heat map of GSE55457. Blue represents the normal group, and red represents the RA group. E Comprehensive screening of DERATGs by DEGs of the two datasets and GeneCards

Fig. 5 Differentially expressed genes targeted by RA (DERATGs) functional correlation evaluation. A Gene Ontology (GO) biological function enrichment of DERATGs. The x-axis represents the enrichment of DERATGs in GO entries, and the color of the dots represents the adjusted P-value: a lower adjusted P-value is shown in red, and a higher adjusted P-value is shown in blue. The size of the dots indicates the number of enriched genes. B, C Display of DERATGs GO biological function enrichment. D KEGG pathway enrichment of DERATGs. The size of the dots indicates the number of enriched genes. E, F Display of DERATGs KEGG pathway enrichment. G Disease Ontology (DO) enrichment of DERATGs. The x-axis represents the percentage of DERATGs enriched in the disease term, the y-axis displays the names of the enriched diseases, and the dot color indicates the adjusted P-value: a lower adjusted P-value is shown in red, and a higher adjusted P-value is shown in blue. The size of the dots indicates the number of enriched genes. H, I Display of DERATGs DO enrichment

Fig. 6 Gene Set Enrichment Analysis and Gene Set Variation Analysis of the GSE55235 and GSE55457 datasets. A Analysis of differential genes' upregulated pathways in the GSE55235 dataset. B Analysis of differential genes' downregulated pathways in the GSE55235 dataset. C Differentially enriched pathways in the GSE55235 dataset. D Analysis of differential genes' upregulated pathways in the GSE55457 dataset. E Analysis of differential genes' downregulated pathways in the GSE55457 dataset. F Differentially enriched pathways in the GSE55457 dataset

Fig. 7 Protein-protein interaction network analysis of differentially expressed genes targeted by RA (DERATGs). A Data on the number of protein interaction relationships of DERATGs. B Protein interaction network of DERATGs. C Network diagram of the top 20 hub nodes. D, E Protein interaction network subnetwork construction based on DERATGs, D Module 1 and E Module 2

Fig. 8 Construction of correlation network and RNA-binding protein (RBP) correlation analysis based on differentially expressed genes targeted by RA (DERATGs). A Transcription factor (TF)-target network for DERATGs. The arrow is directed toward the targeted DERATGs from the predicted TF. B MicroRNA (miRNA)-mRNA network. Red nodes represent DERATGs, green nodes represent related miRNAs, and lines represent the regulatory relationships between DERATGs and miRNAs. C Heat map of the correlation between 54 DERATGs and RBP genes in the expression profile of the GSE55235 dataset. Positively correlated genes are represented by red, whereas negatively correlated genes are represented by blue. D Heat map of the correlation between 54 DERATGs and RBP genes in the expression profile of the GSE55457 dataset. Positively correlated genes are represented by red, whereas negatively correlated genes are represented by blue

Fig. 9 Construction of interaction networks between drugs and DERATGs targets. A Construction of the target-drug network through the PharmGKB database. B Construction of the target-drug network through the DrugCentral database. Blue indicates DERATGs, and green indicates the drugs predicted by DERATGs

Fig. 10 Investigation and visual representation of immune cell infiltration. A Heat map of immune infiltration in the GSE55235 dataset; red cells indicate rheumatoid arthritis (RA) and blue cells indicate normal. B Boxplot of the significantly different immune infiltrating cells in the GSE55235 dataset, with red for the RA group and blue for the normal group. C Heat map of immune infiltration in the GSE55457 dataset; red cells indicate RA and blue cells indicate normal. D Boxplot of the significantly different immune infiltrating cells in the GSE55457 dataset, with red for the RA group and blue for the normal group. ****p < 0.0001, ***p < 0.001, **p < 0.01, and *p < 0.05

Fig. 12 Differential analysis of DERATGs between subtypes. A Expression of 11 DERATGs in the two subtypes of the GSE55235 dataset. Red and green symbolize the groups in Clusters 1 and 2, respectively. B Expression of 16 DERATGs in the two subtypes of the GSE55457 dataset. Red and green symbolize the groups in Clusters 1 and 2, respectively

Table 1 Data information summary

Table 2 GO enrichment summary

Table 3 KEGG enrichment summary

Table 4 DO enrichment summary